Skip to content

test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity#1154

Merged
carlos-alm merged 4 commits into
mainfrom
feat/1121-language-registry-native-parity
May 19, 2026
Merged

test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity#1154
carlos-alm merged 4 commits into
mainfrom
feat/1121-language-registry-native-parity

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

Closes #1121.

The acceptance criterion from #1071 called for a CI gate that fires when a JS LANGUAGE_REGISTRY entry has no corresponding Rust extractor (or sits on an explicit allowlist). The existing drift guard in tests/parsers/native-drop-classification.test.ts covers NATIVE_SUPPORTED_EXTENSIONS ↔ parser_registry.rs (link 2 ↔ 3), but the link from LANGUAGE_REGISTRY to NATIVE_SUPPORTED_EXTENSIONS (link 1 ↔ 2) had no test. A WASM-only language added to the registry would silently be classified as unsupported-by-native and only the JS backfill would parse it — exactly the regression class #1071 was meant to prevent.

Adds a new describe block (LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity) with three tests:

  1. Every LANGUAGE_REGISTRY extension has a Rust extractor (or appears on the documented WASM_ONLY_ALLOWLIST).
  2. Allowlist entries can't reference extensions that already have a Rust extractor (catches stale entries left behind after a Rust port).
  3. Allowlist entries must reference extensions actually declared in LANGUAGE_REGISTRY (catches typos and dead config).

WASM_ONLY_ALLOWLIST is an empty Set<string> today, documented in-code so future contributors know when to add to it (intentional WASM-only languages with a tracked Rust port) vs. when to port directly.

Test plan

  • npx vitest run tests/parsers/native-drop-classification.test.ts — 16/16 passing (13 existing + 3 new).
  • Verified the parity logic catches simulated drift (added .toy to a registry copy and confirmed the detector flagged it) before removing the throwaway test.
  • npx biome check tests/parsers/native-drop-classification.test.ts — clean.
  • CI green.

…arity

Closes #1121.

The drift guard between NATIVE_SUPPORTED_EXTENSIONS and parser_registry.rs
covered link 2 ↔ 3, but link 1 ↔ 2 (LANGUAGE_REGISTRY ↔
NATIVE_SUPPORTED_EXTENSIONS) had no test. A WASM-only language added to
the registry would silently degrade the native engine without flagging.

Adds three tests with an explicit WASM_ONLY_ALLOWLIST (currently empty) so
the allowlist itself can't rot — entries must reference a real registry
extension and must not duplicate a language that's already been ported.
@claude
Copy link
Copy Markdown

claude Bot commented May 18, 2026

Claude finished @carlos-alm's task —— View job


PR Review - Final Assessment

Reviewing PR #1154: test(parsers): gate LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS parity

Todo List

  • Read the modified test file to understand implementation
  • Analyze against Phase 1: Root Cause Analysis
  • Analyze against Phase 2: Configuration Change Detection
  • Analyze against Phase 3: Test Integrity
  • Analyze against Phase 4: Code Quality
  • Analyze against Phase 5: Backlog Compliance
  • Provide final recommendation and rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Real issue from #1121 - there was a gap in drift detection between the JS LANGUAGE_REGISTRY (single source of truth for supported languages) and NATIVE_SUPPORTED_EXTENSIONS (hand-maintained mirror of Rust enum coverage). A WASM-only language could be silently added to the registry without a corresponding Rust extractor, causing the native engine to degrade silently without triggering CI warnings.

Why This Approach: Justified architectural fix. The existing drift guard only covered NATIVE_SUPPORTED_EXTENSIONS ↔ parser_registry.rs (link 2↔3), but the critical link 1↔2 (LANGUAGE_REGISTRYNATIVE_SUPPORTED_EXTENSIONS) was unguarded. This would allow regressions exactly as described in #1071.

Risk Assessment: Low risk - pure additive testing with well-defined allowlist mechanism for intentional exceptions.

Phase 2: Configuration Change Detection ✅

HIGH-RISK Changes: None detected. This PR only adds tests - no linting rules disabled, no timeouts increased, no build configuration modified, no TypeScript strict mode changes.

Assessment: Clean - purely additive test coverage with no configuration relaxation.

Phase 3: Test Integrity ✅

RED FLAGS: None detected. The tests are properly structured and strict:

Strong assertions: Tests use .toEqual([]) which fails on any unexpected entries
Comprehensive coverage: Three complementary tests cover all drift scenarios
No weakened expectations: Tests maintain strict parity requirements
Clear documentation: Extensive comments explain the architectural purpose

Test Quality Assessment:

  • Test 1 (lines 239-256): Ensures every LANGUAGE_REGISTRY extension either has native support OR is explicitly allowlisted
  • Test 2 (lines 258-266): Prevents stale allowlist entries after Rust ports are completed
  • Test 3 (lines 268-282): Catches typos and orphaned allowlist entries

All tests are properly isolated and test exactly what they claim to test.

Phase 4: Code Quality ✅

Implementation Quality: Excellent

Clean TypeScript: Proper typing with ReadonlySet<string>, correct imports
Logical structure: Tests follow natural dependency order (parity → staleness → orphans)
Error messages: Comprehensive error messages with actionable guidance (lines 250-254, 263-264, 279-280)
Code organization: Well-integrated into existing test structure
Documentation: Extensive inline documentation explaining architectural context

Minor Code Quality Notes:

  • Line 237: Empty allowlist is intentional and well-documented
  • Line 243: Proper case normalization with .toLowerCase()
  • Lines 246-249: Efficient filtering chains that are readable

Security/Performance: No concerns - pure test code with no external dependencies or side effects.

Phase 5: Backlog Compliance ✅

Zero-dep: ✓ — No new runtime dependencies added. Uses existing test infrastructure (vitest, existing imports)

Foundation-aligned: ✓ — Fully aligned. Strengthens the dual-engine architecture (P1: always-current) by ensuring native engine coverage matches the JS registry. No cloud APIs, external DBs, or agent behavior.

Problem-fit: 5/5 — Directly prevents AI agent confusion from inconsistent engine results. When the native engine silently degrades due to missing extractors, agents receive inconsistent dependency graphs depending on engine selection, leading to hallucinated dependencies and incorrect blast radius analysis. This gate prevents exactly that scenario.

Breaking: No — Purely additive test coverage with no behavior changes to public APIs, CLI output, or database schema.

Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None. This is a high-quality defensive improvement that strengthens the core architecture.

Final Recommendation

Root Cause Analysis

Problem Being Solved: CI gap allowing silent native engine degradation via WASM-only language additions
Why This Approach: Closes architectural drift detection gap with proper allowlist mechanism
Risk Assessment: Very low - additive testing only

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies
  • Foundation-aligned: ✓ — Strengthens dual-engine architecture
  • Problem-fit: 5 — Directly prevents AI agent confusion from engine inconsistency
  • Breaking: No — Purely additive
  • Tier: 1

Critical Concerns

None

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: Exemplary defensive programming. Addresses real architectural vulnerability with comprehensive test coverage, clear documentation, and proper escape hatch design. The allowlist mechanism is well-designed for the rare case where WASM-only support is temporarily needed. This directly improves AI agent reliability by preventing silent engine parity drift.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

Adds a new describe block to native-drop-classification.test.ts that gates parity between LANGUAGE_REGISTRY (JS source of truth) and NATIVE_SUPPORTED_EXTENSIONS (the hand-maintained Rust mirror), closing the previously untested link 1\u21942 identified in #1121.

  • Three interlocking tests enforce that every LANGUAGE_REGISTRY extension has a Rust extractor or an allowlist exemption, that allowlist entries are not stale, and that they reference real registry extensions.
  • WASM_ONLY_ALLOWLIST is an empty in-code Set<string> with inline guidance for future contributors, intentionally requiring a deliberate opt-out rather than silence on WASM-only additions.
  • The previous case-normalisation issues on tests 2 and 3 are already applied; minor duplication of registryExts construction remains between tests 1 and 3, and the reverse direction check is absent.

Confidence Score: 5/5

Test-only addition with no production code changes; all three new invariants are logically sound.

The change touches only a test file, adds no new production paths, and the parity logic is correct. The two observations are style and completeness suggestions that do not compromise correctness.

No files require special attention; the single changed file is a test-only addition.

Important Files Changed

Filename Overview
tests/parsers/native-drop-classification.test.ts Adds a new describe block with three tests gating LANGUAGE_REGISTRY parity via an empty WASM_ONLY_ALLOWLIST escape hatch; logic is correct with minor duplication and one missing reverse-direction check.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    LR["LANGUAGE_REGISTRY\n(JS source of truth)"]
    NSE["NATIVE_SUPPORTED_EXTENSIONS\n(JS hand-maintained mirror)"]
    PR["parser_registry.rs\n(Rust implementation)"]
    AL["WASM_ONLY_ALLOWLIST\n(explicit exemptions)"]

    LR -- "Test 1: every ext must land here" --> NSE
    LR -- "or here (Test 1 exemption)" --> AL
    AL -- "Test 2: must NOT already be here" --> NSE
    AL -- "Test 3: must exist in" --> LR
    NSE -- "Existing drift guard (bidirectional)" --> PR
Loading

Fix All in Claude Code

Reviews (5): Last reviewed commit: "Merge branch 'main' into feat/1121-langu..." | Re-trigger Greptile

it('WASM_ONLY_ALLOWLIST does not list extensions that already have a Rust extractor', () => {
// Catches stale allowlist entries: once a language is ported to Rust the
// allowlist line should be deleted, not left behind as dead config.
const stale = [...WASM_ONLY_ALLOWLIST].filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Allowlist entries in tests 2 and 3 are compared as-is, while registryExts (used in test 1 and test 3) and NATIVE_SUPPORTED_EXTENSIONS are all lowercase. If a future contributor adds a mixed-case entry such as .TS to the allowlist, test 2 would silently miss it as stale (NATIVE_SUPPORTED_EXTENSIONS.has('.TS') → false) and test 3 would wrongly report it as an orphan (registryExts.has('.TS') → false because registryExts was built with .toLowerCase()). Normalising allowlist entries to lowercase in these two filters keeps all three tests consistent.

Suggested change
const stale = [...WASM_ONLY_ALLOWLIST].filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));
const stale = [...WASM_ONLY_ALLOWLIST]
.map((ext) => ext.toLowerCase())
.filter((ext) => NATIVE_SUPPORTED_EXTENSIONS.has(ext));

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57eb769 — added .map((ext) => ext.toLowerCase()) before the NATIVE_SUPPORTED_EXTENSIONS filter so a mixed-case allowlist entry can't slip past the staleness check.

registryExts.add(ext.toLowerCase());
}
}
const orphans = [...WASM_ONLY_ALLOWLIST].filter((ext) => !registryExts.has(ext));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same normalisation gap as in test 2 — without .toLowerCase() here, a capitalised allowlist entry would be falsely reported as an orphan even though a matching (lowercase) extension exists in registryExts.

Suggested change
const orphans = [...WASM_ONLY_ALLOWLIST].filter((ext) => !registryExts.has(ext));
const orphans = [...WASM_ONLY_ALLOWLIST]
.map((ext) => ext.toLowerCase())
.filter((ext) => !registryExts.has(ext));

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 57eb769 — added .map((ext) => ext.toLowerCase()) before the registryExts filter so the orphan check matches the lowercase form actually stored in registryExts.

Lowercase allowlist entries before comparing against `registryExts` and
`NATIVE_SUPPORTED_EXTENSIONS` so a mixed-case future entry can't slip
past the staleness and orphan checks. Both reference sets are already
lowercased, so without this normalization a capitalized allowlist entry
would be silently skipped by test 2 and falsely flagged by test 3.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit d255297 into main May 19, 2026
21 checks passed
@carlos-alm carlos-alm deleted the feat/1121-language-registry-native-parity branch May 19, 2026 01:33
@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CI gate: LANGUAGE_REGISTRY ↔ NATIVE_SUPPORTED_EXTENSIONS consistency

1 participant